Adaptive quantum state tomography improves accuracy quadratically



D.H. Mahler, Lee A. Rozema, Ardavan Darabi, Chris Ferrie, Robin Blume-Kohout, and A.M. Steinberg

Centre for Quantum Information & Quantum Control and Institute for Optical Sciences,
Dept. of Physics, 60 St. George St., University of Toronto, Toronto, Ontario, Canada M5S 1A7
Institute for Quantum Computing and Department of Applied Mathematics, University of Waterloo
Sandia National Laboratories
(Dated: March 5, 2013) 

We introduce a simple protocol for adaptive quantum state tomography, which reduces the worst-case infidelity ($1 - F(\hat\rho, \rho)$) between the estimate and the true state from $O(1/\sqrt{N})$ to $O(1/N)$. It uses a single adaptation step and just one extra measurement setting. In a linear optical qubit experiment, we demonstrate a full order of magnitude reduction in infidelity (from 0.1% to 0.01%) for a modest number of samples ($N \approx 3 \times 10^4$).

PACS numbers: 42.50.Dv,42.50.Xa 



Quantum information processing requires reliable, repeatable preparation and transformation of quantum states. Quantum state tomography is used to identify the density matrix $\rho$ that was prepared by such a process. No finite ensemble of $N$ samples is sufficient to uniquely identify $\rho$, so we estimate it, reporting either a single state $\hat\rho$ that is "close" to $\rho$ with high probability [1-5], or a confidence region of nonzero radius that contains $\rho$ with high probability [6, 7]. Both approaches must accept some inaccuracy (the discrepancy between $\hat\rho$ and $\rho$) or imprecision (the diameter of the confidence region). This can be quantified in many ways (e.g., trace norm, fidelity, relative entropy, etc.), but the universal goal of state tomography is to minimize it. In this paper, we consider a well-motivated (see concluding discussion) and popular measure of a point estimator's inaccuracy, quantum infidelity,

$$1 - F(\hat\rho, \rho) = 1 - \mathrm{Tr}\left(\sqrt{\sqrt{\rho}\,\hat\rho\,\sqrt{\rho}}\right)^2, \qquad (1)$$

and its scaling as $N \to \infty$.
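As a concrete illustration of Eq. (1), the sketch below evaluates the infidelity of two single-qubit states from their Bloch vectors. It assumes the standard qubit identity $F = \mathrm{Tr}(\hat\rho\rho) + 2\sqrt{\det\hat\rho\,\det\rho}$, which in Bloch-vector form reads $F = \frac{1}{2}\left(1 + \vec r\cdot\vec s + \sqrt{(1-|\vec r|^2)(1-|\vec s|^2)}\right)$. The function name `qubit_infidelity` and the tuple representation are illustrative choices, not part of the original text.

```python
import math

def qubit_infidelity(r, s):
    """Return 1 - F for two qubit states given by Bloch vectors r and s.

    Uses the qubit closed form of Eq. (1):
    F = (1 + r.s + sqrt((1 - |r|^2)(1 - |s|^2))) / 2.
    """
    dot = sum(a * b for a, b in zip(r, s))
    r2 = sum(a * a for a in r)
    s2 = sum(b * b for b in s)
    # Clamp at zero so rounding just outside the Bloch ball cannot break sqrt.
    root = math.sqrt(max(0.0, 1.0 - r2) * max(0.0, 1.0 - s2))
    return 1.0 - 0.5 * (1.0 + dot + root)

# A pure state along +z compared with a slightly depolarized copy.
print(qubit_infidelity((0.0, 0.0, 1.0), (0.0, 0.0, 0.98)))  # approximately 0.01
```

Because one of the two states is pure, the square-root term vanishes and the infidelity is linear in the radial error, which is the behaviour the rest of the paper turns on.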

First, we show that the infidelity of standard tomography with static measurements can't beat $1 - F = O(1/\sqrt{N})$ as $N \to \infty$ for a large and important class of states. Second, we introduce a simple adaptive protocol that achieves $1 - F = O(1/N)$ for every state, and explain why it works. Finally, we demonstrate this effect in a linear optical experiment, and achieve a 10-fold improvement in infidelity (from 0.1% to 0.01% with $N = 3 \times 10^4$ measurements) over standard tomography.

Adaptivity has been proposed in various contexts (e.g., Ref. [8] recently treated state estimation as parameter estimation, obtaining a result complementary, but largely orthogonal, to those reported here). Single-step adaptive tomography was first analyzed in Ref. [9], and later refined in [10-12]. Here, we present a simple, self-contained derivation of: (1) why quantum fidelity is significant; (2) why adaptive tomography achieves far better infidelity; and (3) how the adaptation should be done. In contrast to prior work, we optimize worst-case infidelity, rather than average infidelity. This allows us to achieve high accuracy for all states, whereas previous approaches (e.g., [11]) yielded low accuracy on a small but important set of states (see concluding discussion).



ADAPTIVE TOMOGRAPHY 

Static tomography uses data from a fixed set of measurements. Different measurements yield subtly different tomographic accuracy [13], but to leading order, "good" protocols for single-qubit tomography provide equal information [14] about every component of the unknown density matrix $\rho$,

$$\rho = \frac{1}{2}\left(\mathbb{1} + \langle\sigma_x\rangle\sigma_x + \langle\sigma_y\rangle\sigma_y + \langle\sigma_z\rangle\sigma_z\right). \qquad (2)$$

The canonical example (which we consider hereafter) is to measure the three Pauli operators ($\sigma_x$, $\sigma_y$, $\sigma_z$). This minimizes the variance of the estimator $\hat\rho$, but not the expected infidelity, for two reasons.

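First, the variance of the estimate $\hat\rho$ depends also on $\rho$ itself. Consider the linear inversion estimator $\hat\rho_{\rm lin}$, defined by estimating $\langle\sigma_z\rangle = (n_\uparrow - n_\downarrow)/(n_\uparrow + n_\downarrow)$ from the counts of $+1$ and $-1$ outcomes (and similarly for $\langle\sigma_x\rangle$ and $\langle\sigma_y\rangle$), and substituting into Eq. (2). Each measurement behaves like $N/3$ flips of a coin with bias $p_k = \frac{1}{2}(1 + \langle\sigma_k\rangle)$, and yields

$$\hat p_k = p_k \pm \sqrt{\frac{3}{N}}\,\sqrt{p_k(1 - p_k)} \qquad (3)$$

$$\Rightarrow\ \langle\sigma_k\rangle_{\rm estimated} = \langle\sigma_k\rangle_{\rm true} \pm \sqrt{\frac{3}{N}}\,\sqrt{1 - \langle\sigma_k\rangle^2}. \qquad (4)$$

When $\langle\sigma_k\rangle \approx 0$, its estimate has a large variance, but when $\langle\sigma_k\rangle \approx \pm 1$, the variance is very small. As a result, the variance of $\hat\rho$ around $\rho$ is anisotropic and $\rho$-dependent (see Fig. 1).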
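The coin-flip picture behind Eqs. (3) and (4) can be checked with a few lines of arithmetic. The sketch below assumes each Pauli operator receives exactly $N/3$ shots and reports one-standard-deviation errors; the function names `bias_std` and `sigma_std` are hypothetical.

```python
import math

def bias_std(expectation, n_total):
    """One-sigma error of the coin bias p_k = (1 + <sigma_k>)/2, as in Eq. (3)."""
    p = 0.5 * (1.0 + expectation)
    return math.sqrt(p * (1.0 - p) / (n_total / 3.0))

def sigma_std(expectation, n_total):
    """One-sigma error of <sigma_k> itself, as in Eq. (4)."""
    return math.sqrt((1.0 - expectation ** 2) / (n_total / 3.0))

# The error shrinks as the expectation value approaches +/-1.
for value in (0.0, 0.9, 0.999):
    print(value, sigma_std(value, 30000))
```

Since $\langle\sigma_k\rangle = 2p_k - 1$, the error on the expectation value is simply twice the error on the bias, which is why the two expressions share the same $\sqrt{3/N}$ factor.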

Second, the dependence of infidelity on the error, $\Delta = \hat\rho - \rho$, also varies with $\rho$. Infidelity is hypersensitive to misestimation of small eigenvalues. A Taylor expansion of $1 - F(\rho, \hat\rho)$ yields (in terms of $\rho$'s eigenbasis $\{|i\rangle\}$),

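$$1 - F(\rho, \rho + \epsilon\Delta) = \frac{\epsilon^2}{2} \sum_{i,j} \frac{|\langle i|\Delta|j\rangle|^2}{\langle i|\rho|i\rangle + \langle j|\rho|j\rangle} + O(\epsilon^3). \qquad (5)$$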

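Infidelity is quadratic in $\Delta$, except that as an eigenvalue $\langle i|\rho|i\rangle$ approaches 0, its sensitivity to variations in $\langle i|\Delta|i\rangle$ diverges, and $1 - F$ becomes linear [15] in $\Delta$:

$$1 - F(\rho, \rho + \epsilon\Delta) = \epsilon \sum_{i:\ \langle i|\rho|i\rangle = 0} \langle i|\Delta|i\rangle + O(\epsilon^2). \qquad (6)$$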
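The crossover from quadratic to linear sensitivity can be seen numerically for a qubit. The sketch below is a minimal check under stated assumptions: it perturbs only the populations of a state that is diagonal in the $\sigma_z$ basis, and it uses the Bloch-vector closed form of the squared fidelity in Eq. (1). For such a diagonal perturbation $\delta$ of $\mathrm{diag}(\lambda, 1-\lambda)$, the leading quadratic term works out to $(\delta^2/4)\,(1/\lambda + 1/(1-\lambda))$. All function names are illustrative.

```python
import math

def infidelity(r, s):
    """1 - F for qubit Bloch vectors r and s (squared-fidelity convention)."""
    dot = sum(a * b for a, b in zip(r, s))
    r2 = sum(a * a for a in r)
    s2 = sum(b * b for b in s)
    root = math.sqrt(max(0.0, 1.0 - r2) * max(0.0, 1.0 - s2))
    return 1.0 - 0.5 * (1.0 + dot + root)

def diag_state(lam):
    """Bloch vector of diag(lam, 1 - lam) in the sigma_z eigenbasis."""
    return (0.0, 0.0, 2.0 * lam - 1.0)

def quadratic_term(lam, delta):
    """Leading-order infidelity for a population shift delta when lam > 0."""
    return 0.25 * delta ** 2 * (1.0 / lam + 1.0 / (1.0 - lam))

for delta in (1e-2, 1e-3):
    mixed = infidelity(diag_state(0.3), diag_state(0.3 + delta))
    pure = infidelity(diag_state(0.0), diag_state(delta))
    # Mixed state: quadratic in delta. Zero eigenvalue: linear in delta.
    print(delta, mixed, quadratic_term(0.3, delta), pure)
```

For the mixed state the infidelity tracks the quadratic prediction, while for the state with a zero eigenvalue it equals $\delta$ itself, in line with Eq. (6).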

So to minimize infidelity, we must accurately estimate the small eigenvalues of $\rho$, particularly those that are (or appear to be) zero. For states deep within the Bloch sphere, static tomography achieves infidelity of $O(1/N)$ [9, 16]: typical errors scale as $|\Delta| = O(1/\sqrt{N})$ (Eq. 4), and infidelity scales as $1 - F = O(|\Delta|^2)$. But for states with eigenvalues less than $O(1/\sqrt{N})$, infidelity scales as $O(1/\sqrt{N})$. Since quantum information processing relies on pure and nearly-pure states, it is this poor scaling (rather than the $O(1/N)$ scaling for highly mixed states) that is significant.




FIG. 1. Two features of qubit tomography with Pauli measurements (shown for an equatorial cross-section of the Bloch sphere): (a) The distribution or "scatter" of any unbiased estimator $\hat\rho$ (depicted by dull red ellipses) varies with the true state $\rho$ (bright red dots at the center of ellipses); (b) The expected infidelity between $\hat\rho$ and $\rho$ as a function of $\rho$. Within the Bloch sphere, the expected infidelity is $O(1/N)$. But in a thin shell of nearly-pure states (of thickness $O(1/\sqrt{N})$), it scales as $O(1/\sqrt{N})$, except when $\rho$ is aligned with a measurement axis (Pauli X, Y, or Z).

To achieve better performance, we observe that if $\rho$ is diagonal in one of the measured bases (e.g., $\sigma_z$), then infidelity always scales as $O(1/N)$, no matter what $\rho$'s eigenvalues are. The increased sensitivity of $1 - F$ to error in small eigenvalues (Eq. 5) is precisely canceled by the reduced inaccuracy that accompanies a highly biased measurement-outcome distribution (Eq. 4). This suggests an obvious (if naive) solution: we should simply choose one of our measurement bases to be the diagonal basis of $\rho$!

Of course, this is impossible: knowing $\rho$ would render tomography pointless. But we can perform standard tomography on $N_0 < N$ samples, get a preliminary estimate $\hat\rho_0$, and measure the remaining $N - N_0$ samples in a frame where one basis diagonalizes $\hat\rho_0$. This measurement will not exactly diagonalize $\rho$, but if $N_0 \gg 1$ it will be fairly close. The angle $\theta$ between the eigenbases of $\rho$ and $\hat\rho_0$ is $O(|\Delta|) = O(1/\sqrt{N_0})$. This implies that if $\rho$ has an eigenvector $|\psi_k\rangle$ with eigenvalue $\lambda_k = 0$, then the probability of the corresponding measurement outcome $|\hat\psi_k\rangle\langle\hat\psi_k|$ will be at most $p_k = \sin^2\theta \approx \theta^2 = O(1/N_0)$. Since we make this measurement on $O(N - N_0)$ copies [17], the final error in the estimated $p_k$ (and therefore in the eigenvalue $\lambda_k$) is $O(1/\sqrt{N_0(N - N_0)})$. So using a constant fraction $N_0 = \alpha N$ of the available samples for the preliminary estimation should yield $O(1/N)$ infidelity for all states.
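The scaling argument above can be turned into a quick back-of-the-envelope calculation. The sketch below drops every constant factor and keeps only the $1/\sqrt{N_0(N - N_0)}$ dependence, so it is an order-of-magnitude illustration rather than a prediction; `eigenvalue_error` is a hypothetical name.

```python
import math

def eigenvalue_error(n_total, alpha):
    """Order-of-magnitude error 1/sqrt(N0 (N - N0)) with N0 = alpha * N."""
    n0 = alpha * n_total
    return 1.0 / math.sqrt(n0 * (n_total - n0))

n_total = 30000
for alpha in (0.1, 0.25, 0.5, 0.75, 0.9):
    # Multiplying by N exposes the alpha-dependent prefactor 1/sqrt(alpha (1 - alpha)).
    print(alpha, eigenvalue_error(n_total, alpha) * n_total)
```

For any fixed fraction the error falls as $1/N$, and within this crude model the prefactor $1/\sqrt{\alpha(1-\alpha)}$ is smallest for an even split of the samples.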

A very similar protocol was suggested by Bagan et al. in Ref. [11]. However, that analysis concluded that $N_0 \propto N^p$ for $p \ge 1/2$ would be sufficient. This only applies to average infidelity when $\rho$ is drawn from a particular ensemble. Our analysis shows that this choice yields $1 - F = O(N^{-5/6})$ for almost all nearly-pure states.
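To see where an exponent below 1 comes from, one can feed a sublinear preliminary sample size $N_0 = N^p$ into the same $1/\sqrt{N_0(N - N_0)}$ estimate and read off the log-log slope. The sketch below does this numerically; constants are again ignored, and the function names and the two sample sizes used for the slope are arbitrary illustrative choices.

```python
import math

def worst_case_error(n_total, p):
    """Scaling estimate 1/sqrt(N0 (N - N0)) for a preliminary stage of N0 = N**p."""
    n0 = n_total ** p
    return 1.0 / math.sqrt(n0 * (n_total - n0))

def loglog_slope(p, n_low=1e6, n_high=1e8):
    """Finite-difference slope of log(error) versus log(N) between two sample sizes."""
    low = worst_case_error(n_low, p)
    high = worst_case_error(n_high, p)
    return (math.log(high) - math.log(low)) / (math.log(n_high) - math.log(n_low))

print(loglog_slope(2.0 / 3.0))  # close to -5/6
print(loglog_slope(0.5))        # close to -3/4
```

In this model the exponent is $-(1 + p)/2$, so only a preliminary stage that grows linearly with $N$ reaches $-1$.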

SIMULATION RESULTS 

To confirm the theory, we performed numerical simulations of single-qubit tomography using four different protocols: (1) standard fixed-measurement tomography; (2) adaptive tomography with $N_0 = N^{2/3}$, as proposed in [11]; (3) adaptive tomography with $N_0 = \alpha N$ (for a range of $\alpha$); and (4) "known basis" tomography, wherein we cheat by adjusting our measurement frame for all $N$ samples to align with $\rho$'s eigenbasis. We simulated many true states $\rho$, but present only the most interesting case, a
pure state at 45 degrees to both the ax and axes. Our 
results are not particularly sensitive to the exact estimator used; we used maximum-likelihood estimation (MLE) with a quadratic approximation to the negative log-likelihood function:

where $f_k = n_k/N_k$ are the observed frequencies of the +1 eigenvectors of the three Pauli operators $\sigma_k$ (each with its corresponding projector), and $N_k$ is the number of samples on which $\sigma_k$ was measured. Convex optimization (in MATLAB [18]) was used to find $\hat\rho_{\rm MLE}$. Results were averaged over many (typically 150) randomly generated measurement records.

Figure 2 shows average infidelity versus $N$. We fit these simulated data to power laws of the form $1 - F = \beta N^p$, and found $p = -0.513 \pm 0.006$ (for static tomography), $p = -0.868 \pm 0.008$ (for adaptive tomography with $N_0 = N^{2/3}$), $p = -0.980 \pm 0.006$ (for adaptive tomography with $N_0 = 0.5N$), and $p = -0.993 \pm 0.09$ (for known-basis tomography). These results are not significantly different [19] from predictions of the simple theory ($p = -1/2$, $-5/6$, $-1$, and $-1$, respectively). The borderline-significant discrepancy is, we believe, due to boundary effects ($\hat\rho_{\rm MLE}$ is constrained to be positive) that aren't modeled in the simple theory. We also varied $\alpha = N_0/N$ (Fig. 2 inset) and found that $\alpha = 1/2$ optimizes the prefactor ($\beta$).
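A power law $1 - F = \beta N^p$ is a straight line on log-log axes, so one simple way to extract $\beta$ and $p$ is an ordinary least-squares fit of $\log(1 - F)$ against $\log N$. The sketch below shows that approach; the text does not state which fitting procedure or weighting was actually used, so the unweighted fit here is an assumption, and the data in the usage example are synthetic rather than taken from the simulations.

```python
import math

def fit_power_law(ns, infidelities):
    """Unweighted least-squares fit of log(1 - F) = log(beta) + p * log(N).

    Returns the pair (beta, p).
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in infidelities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    beta = math.exp(mean_y - p * mean_x)
    return beta, p

# Synthetic, noise-free data following 1 - F = 2 / N (not data from the paper).
ns = [100, 1000, 10000, 100000]
beta, p = fit_power_law(ns, [2.0 / n for n in ns])
print(beta, p)
```

This sketch returns point estimates only; quoting uncertainties such as those above would additionally require propagating the scatter of the averaged infidelities through the fit.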



[Figure 3 panel labels: State Preparation; Projective Measurement.]

[Figure 2 plot. Horizontal axis: log10(N). Vertical axis ticks run from -2 to -6.5. Legend: Standard Tomography; Adaptive Tomography ($N_0 = N^{2/3}$); Adaptive Tomography ($N_0 = N/2$); Known State Tomography.]



FIG. 2. Average infidelity $1 - F(\hat\rho, \rho)$ vs. sample size $N$ for Monte Carlo simulations of four different tomographic protocols: standard tomography (black), the procedure proposed in [11] using $N_0 = N^{2/3}$ (red), our procedure using $N_0 = N/2$ (blue), and "known basis" tomography (green). Both adaptive procedures clearly outperform static tomography, but our procedure clearly outperforms the $N_0 = N^{2/3}$ approach, and matches the asymptotic scaling of known-basis tomography. The inset shows the dependence of the prefactor ($\beta$) on $\alpha = N_0/N$.



EXPERIMENTAL RESULTS 

We implemented our simple adaptive protocol experimentally in linear optics. Photon pairs were created using type-I spontaneous parametric down-conversion in a nonlinear crystal. One photon of each pair was sent immediately to a single-photon counting module (SPCM) to act as a trigger. The second photon was sent through a Glan-Thompson polarizer to prepare it in a state of very pure linear polarization. Waveplates were first used to prepare the polarization state of the photon, and subsequently used in tandem with a polarizing beamsplitter to project onto any state on the Bloch sphere. The waveplates are computer-controlled, enabling changes during the experiment.




FIG. 3. Spontaneous parametric down-conversion is performed by pumping a nonlinear BBO crystal with linearly polarized light. One photon is sent directly to a detector as a trigger. A rotation using a quarter-half waveplate combination prepares the other photon in any desired polarization state. Finally, a projective measurement onto any axis of the Bloch sphere is performed by a quarter-half waveplate combination followed by a polarizing beamsplitter. The measurement waveplates are connected to a computer to enable adaptation.



Our "standard tomography" protocol involved collecting $N/3$ photons at each of three measurement settings corresponding to $\sigma_x$, $\sigma_y$, and $\sigma_z$, and computing $\hat\rho_{\rm MLE}$ as outlined in [20]. Each data point in Fig. 4 represents an average over many (~150) repetitions. The "true state" $\rho$ was determined from one very long tomographic
experiment in which N = 10^ photons were collected. 
The overwhelming size of this dataset ensures accuracy 
sufficient to calibrate the other experiments, all of which 
involve $N \le 3 \times 10^4$ photons.

To do adaptive tomography, we measure $N_0$ of the photons first and use the outcomes to generate an ML estimate $\hat\rho_0$. We then rotate the measurements so that one coincides with the eigenbasis of the preliminary estimate. Finally, we measure the remaining $N - N_0$ photons in this new set of bases and construct a final ML estimate of the state from all the data collected in both phases.
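The two-stage procedure just described can be mimicked in a small Monte Carlo sketch. Several simplifications are assumptions of this illustration and not the method used in the experiment: the estimator is linear inversion followed by clipping to the Bloch ball (swapped in for maximum likelihood), the final estimate uses only the second-stage data rather than all data, measurements are ideal, and every function name is hypothetical.

```python
import math
import random

AXES = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        return AXES[2]  # arbitrary fallback direction for a null vector
    return tuple(x / norm for x in v)

def clip_to_ball(v):
    """Pull an unphysical estimate back onto the Bloch sphere."""
    return normalize(v) if dot(v, v) > 1.0 else tuple(v)

def infidelity(r, s):
    """1 - F for qubit Bloch vectors (squared-fidelity convention of Eq. 1)."""
    root = math.sqrt(max(0.0, 1.0 - dot(r, r)) * max(0.0, 1.0 - dot(s, s)))
    return 1.0 - 0.5 * (1.0 + dot(r, s) + root)

def measure(r, axis, shots, rng):
    """Simulate `shots` projective measurements along `axis`; return the mean outcome."""
    p_plus = min(1.0, max(0.0, 0.5 * (1.0 + dot(r, axis))))
    ups = sum(1 for _ in range(shots) if rng.random() < p_plus)
    return 2.0 * ups / shots - 1.0

def estimate_in_frame(r, frame, shots, rng):
    """Linear inversion in an orthonormal frame, followed by clipping."""
    coeffs = [measure(r, axis, shots, rng) for axis in frame]
    raw = tuple(sum(c * axis[i] for c, axis in zip(coeffs, frame)) for i in range(3))
    return clip_to_ball(raw)

def frame_aligned_with(v):
    """Orthonormal frame whose third axis points along v."""
    a3 = normalize(v)
    helper = AXES[0] if abs(a3[0]) < 0.9 else AXES[1]
    a1 = normalize(cross(a3, helper))
    return (a1, cross(a3, a1), a3)

def static_tomography(r, n_total, rng):
    return estimate_in_frame(r, AXES, n_total // 3, rng)

def adaptive_tomography(r, n_total, rng, alpha=0.5):
    n0 = int(alpha * n_total)
    r0 = static_tomography(r, n0, rng)   # preliminary estimate from N0 samples
    frame = frame_aligned_with(r0)       # one measurement axis along that estimate
    return estimate_in_frame(r, frame, (n_total - n0) // 3, rng)

def mean_infidelity(protocol, r, n_total, trials, rng):
    return sum(infidelity(r, protocol(r, n_total, rng)) for _ in range(trials)) / trials

rng = random.Random(0)
r_true = (math.sqrt(0.5), math.sqrt(0.5), 0.0)  # pure state, 45 degrees from x and y
print("static  :", mean_infidelity(static_tomography, r_true, 30000, 10, rng))
print("adaptive:", mean_infidelity(adaptive_tomography, r_true, 30000, 10, rng))
```

Even with this cruder estimator, aligning one axis with the preliminary estimate makes the outcome distribution along that axis highly biased, which is what suppresses the error in the small eigenvalue.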

Figure 4 compares standard tomography to adaptive tomography with $N_0 = N^{2/3}$ and with $N_0 = N/2$. We fit a power law ($1 - F = \beta N^p$) to the average infidelity of each protocol, and found $p = -0.51 \pm 0.02$ for standard tomography, $p = -0.71 \pm 0.04$ for the procedure of Ref. [11], and $p = -0.90 \pm 0.04$ for our adaptive procedure.

Data confirm that adaptive tomography outperforms standard tomography by an order of magnitude even for modest $N$ ($\sim 10^4$). Our data generally agree with theory,
but some experiments (specifically, those that achieve 
very low infidelities) show small but statistically signifi- 
cant discrepancies with theory. Infidelities as low as 10~^ 
are dominated by systematic errors not modeled in the 
theory. We believe the primary source of systematic er- 
rors is imperfections in the waveplates and their angles - 
in simulations, fluctuations on the order of 10~^ radians 






are sufficient to reproduce the observed deviations. 



[Figure 4 plot. Horizontal axis: log10(N), ticks from 2.5 to 4.5. Vertical axis ticks from -2.5 to -4.5. Legend: Standard Tomography; Adaptive Tomography ($N_0 = N^{2/3}$); Adaptive Tomography ($N_0 = N/2$).]



FIG. 4. Experimental data: The average infidelity $1 - F(\hat\rho, \rho)$ for the three tomographic protocols shown in Fig. 2 vs. the number of samples $N$. Each average is over 150 different realizations of the experiment; error bars are the standard deviation of the mean of these samples.

Finally, we devised and performed an even simpler adaptive procedure. After using standard tomography on $N_0 = N/2$ samples to get a preliminary estimate $\hat\rho_0$, we measured all of the remaining $N - N_0$ samples in the diagonal basis of $\hat\rho_0$. Fig. 5 shows the results; this reduced adaptive tomography procedure achieves the same $O(1/N)$ infidelity. The best fits to the exponent $p$ in $1 - F = \beta N^p$ are $p = -0.51 \pm 0.02$ for standard tomography and $p = -0.88 \pm 0.05$ for reduced adaptive tomography (not significantly different from the results shown in Fig. 4). Reduced adaptive tomography only requires one extra measurement setting (full adaptive tomography requires three). These procedures generalize to higher dimensional systems, where the advantage of reduced adaptive tomography over adaptive tomography is even greater.



DISCUSSION 

We demonstrated two easily implemented adaptive tomography procedures that achieve $1 - F(\hat\rho, \rho) = O(1/N)$ for every qubit state. In contrast, any static tomography protocol will yield infidelity $O(1/\sqrt{N})$ for most nearly-pure states. These adaptive schemes require only marginally more resources than standard tomography (just one more measurement setting for the reduced scheme). We see almost no reason not to use reduced adaptive tomography in future experiments.

The $O(1/N)$ infidelity scaling achieved by our scheme is optimal, but the constant can surely be improved: if our scheme has asymptotic error $a/N$, a more sophisticated scheme can achieve $a'/N$ with $a' < a$. The absolutely optimal protocol requires joint measurements on all $N$ samples [21], and while it still suffers error $O(1/N)$, this [unknown] protocol will outperform any local measurement. Even within local protocols, there is undoubtedly some marginal benefit to adapting more than once. What we have shown is that a single adaptation is sufficient for optimal worst-case scaling.

[Figure 5 plot. Horizontal axis: log10(N). Legend: Standard Tomography; Reduced Adaptive Tomography.]

FIG. 5. Experimental data: standard tomography compared to reduced adaptive tomography. Average infidelity $1 - F(\hat\rho, \rho)$ for standard tomography (black) and reduced adaptive tomography (blue) is plotted versus $N$. Each average is over 200 different realizations of the experiment; error bars are the standard deviation of the mean of these samples.

Closely related previous work [11] optimized average fidelity over a specific measure. They chose to average over Bures measure, a very respectable choice [22-24]. Unfortunately, the "hard-to-estimate" states lie in a thin shell at the surface of the Bloch sphere, and as $N \to \infty$, this shell's Bures measure vanishes. So although the scheme with $N_0 \propto N^{2/3}$ proposed in [11] achieves Bures-average infidelity $O(1/N)$, it achieves only $O(1/N^{5/6})$ infidelity for nearly all of the (important) nearly-pure states.

Ironically, while the "hard-to-estimate" states are all nearly pure, restricting the problem to pure states (e.g., via a Bayesian prior supported only on pure states) falsely trivializes it: the average and worst-case risk drops to $O(1/N)$ even with static tomography! The critical difficulty is not in estimating which pure state we have (unitary errors don't impact fidelity very much). It is in distinguishing between small eigenvalues, i.e., telling the difference between $\lambda = 0$ and $\lambda = 1/\sqrt{N}$.

We conclude with an observation that may surprise: adaptivity provides no advantage at all if inaccuracy is measured by trace-norm or 2-norm, which aren't hypersensitive to small variations in small eigenvalues. This does not undermine our result; it has a simple explanation. Trace-norm ($\|\hat\rho - \rho\|_1$) quantifies single-shot distinguishability. When $N \gg 1$ samples are available, it becomes irrelevant. The relevant quantity is $\|\hat\rho^{\otimes N} - \rho^{\otimes N}\|_1$, whose behavior is defined by the Chernoff bound [25], which in turn is well approximated by infidelity. So infidelity is a measure of many-copy distinguishability. Since tomography is necessarily concerned with $N \gg 1$ copies, the advantages of adaptivity are real, and hold for all many-copy metrics (e.g., relative entropy, Chernoff bound, etc.).
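The contrast between the two metrics is easy to see for a qubit, where the trace norm of $\hat\rho - \rho$ equals the Euclidean distance between the two Bloch vectors. The sketch below applies the same small radial error to a pure state and to a mixed state and compares both measures; it assumes the Bloch-vector closed form of Eq. (1), and the function names are illustrative.

```python
import math

def trace_norm(r, s):
    """||rho_r - rho_s||_1 for qubits: the distance between the Bloch vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))

def infidelity(r, s):
    """1 - F for qubit Bloch vectors (squared-fidelity convention)."""
    dot = sum(a * b for a, b in zip(r, s))
    r2 = sum(a * a for a in r)
    s2 = sum(b * b for b in s)
    root = math.sqrt(max(0.0, 1.0 - r2) * max(0.0, 1.0 - s2))
    return 1.0 - 0.5 * (1.0 + dot + root)

delta = 0.01  # identical radial error in both cases
cases = {
    "pure": ((0.0, 0.0, 1.0), (0.0, 0.0, 1.0 - delta)),
    "mixed": ((0.0, 0.0, 0.4), (0.0, 0.0, 0.4 - delta)),
}
for label, (true_state, estimate) in cases.items():
    print(label, trace_norm(true_state, estimate), infidelity(true_state, estimate))
```

The trace norm is the same in both cases, while the infidelity differs by roughly two orders of magnitude, which is the hypersensitivity to small eigenvalues discussed above.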

DHM, LAR, AD, and AMS thank NSERC, CIFAR, 
FATR, CAF, and QuantumWorks for support. Addi- 
tional thanks go to Alan Stummer for designing the co- 
incidence circuit. RBK was supported by the LDRD 
program at Sandia National Laboratories, a multi- 
program laboratory operated by Sandia Corporation, a 
wholly owned subsidiary of Lockheed Martin Corpora- 
tion, for the U.S. Department of Energy's National Nu- 
clear Security Administration under contract DE-AC04- 
94AL85000. 



* dmahler@physics.utoronto.ca
[1] Z. Hradil, Phys. Rev. A 55, 1561 (1997).
[2] M. Paris and J. Rehacek, Quantum State Estimation, Vol. 649 (Springer, 2004).
[3] R. Blume-Kohout, N. J. Phys. 12, 043034 (2010).
[4] R. Blume-Kohout, Phys. Rev. Lett. 105, 200504 (2010).
[5] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, Phys. Rev. Lett. 105, 150401 (2010).
[6] M. Christandl and R. Renner, Phys. Rev. Lett. 109, 120403 (2012).
[7] R. Blume-Kohout, arXiv:1202.5270 (2012).
[8] R. Okamoto, M. Iefuji, S. Oyama, K. Yamagata, H. Imai, A. Fujiwara, and S. Takeuchi, Phys. Rev. Lett. 109, 130404 (2012).
[9] R. D. Gill and S. Massar, Phys. Rev. A 61, 042312 (2000).
[10] E. Bagan, M. Ballester, R. D. Gill, A. Monras, and R. Munoz-Tapia, Phys. Rev. A 73, 032301 (2006).
[11] E. Bagan, M. Ballester, R. Gill, R. Munoz-Tapia, and O. Romero-Isart, Phys. Rev. Lett. 97, 130501 (2006).
[12] F. Huszar and N. M. T. Houlsby, Phys. Rev. A 85, 052120 (2012).
[13] M. D. de Burgh, N. K. Langford, A. C. Doherty, and A. Gilchrist, Phys. Rev. A 78, 052122 (2008).
[14] A. J. Scott, J. Phys. A 39, 13507 (2006).
[15] Because $\rho$ lies on the state-set's boundary, the gradient of $F$ need not vanish in order for $\hat\rho = \rho$ to be a local maximum.
[16] T. Sugiyama, P. S. Turner, and M. Murao, N. J. Phys. 14, 085005 (2012).
[17] The "O" notation is necessary here because some of the remaining $N - N_0$ copies may be measured in other bases that make up a complete measurement frame.
[18] J. Lofberg, Proc. CACSD (Taipei) (2004).
[19] All quoted uncertainties herein are $1\sigma$, or 68% confidence intervals. Therefore, we don't expect the "true" value to lie within the error bars more than 68% of the time. Most of the results given here agree with theoretical predictions to within $2\sigma$ (95% confidence intervals), a common criterion for consistency between data and theory.
[20] D. F. V. James, P. G. Kwiat, W. J. Munro, and A. G. White, Phys. Rev. A 64, 052312 (2001).
[21] S. Massar and S. Popescu, Phys. Rev. Lett. 74, 1259 (1995).
[22] M. Hübner, Phys. Lett. A 163, 239 (1992).
[23] D. Petz and C. Sudar, J. Math. Phys. 37, 2662 (1996).
[24] K. Zyczkowski and H.-J. Sommers, Phys. Rev. A 71, 032313 (2005).
[25] K. Audenaert, J. Calsamiglia, R. Munoz-Tapia, E. Bagan, L. Masanes, A. Acin, and F. Verstraete, Phys. Rev. Lett. 98, 160501 (2007).



